Kalman Filter


In this exercise we will apply the classic Kalman Filter (KF) algorithm to the Google Flu Trends data we previously used to explore the state-space model. Unlike the previous exercise that fit a single time-series, we’ll utilize the matrix version of the Kalman filter to look at the flu across New England.

In the multivariate version of the KF, the connection between state variables in the Analysis step is provided in two ways: (1) through interactions in the process model itself, \(MPM^T\), and (2) through the covariance in the process error, \(Q\). In this assignment we’ll assimilate all 4 combinations of with/without interactions in the process model versus with/without correlation in the process error to evaluate how each impacts the inferences made. Since the KF will always be following the data, the largest impact of these choices will show up in the state uncertainties, especially during periods of missing data.

To begin, let’s load and plot the flu data for New England.

## load the Google flu data & select states
gflu = read.csv("http://www.google.org/flutrends/about/data/flu/us/data.txt",skip=11)
time = as.Date(gflu$Date)
states = c("Massachusetts","Connecticut","Rhode.Island","New.Hampshire","Vermont","Maine")
nstates = length(states)
y = t(gflu[,states])

## plot time-series from states
plot(time,1:length(time),type='n',ylab="Flu Index",lwd=2,log='y',ylim=range(y,na.rm=TRUE))
for(i in 1:nstates){
  lines(time,y[i,],col=i,lwd=2)
}
legend("topleft",legend=states,lwd=2,col=1:nstates)

The Kalman Filter does not estimate parameters, so we will use parameters that were previously estimated by fitting a state-space model to the data. In a real-world situation you wouldn’t fit two models to the same data (double dipping!), but rather you could fit a state-space model to the previous data and then use an operational forecast moving forward. Alternatively, you might augment the state matrix in the KF to include both the model states and the model parameters. However, for the classic KF this approach is limited to parameters that enter the model linearly, i.e. parameters that can be written into the augmented state transition matrix M. Therefore, you are limited to estimating parameters in the process model, f(X), not the parameters in the Observation Error or Process Error matrices. For the Kalman Filter exercise today we will be using estimates of these variance parameters, not the states, to inform the KF. Keep in mind that the KF treats these parameters as KNOWN and thus ignores parameter uncertainty.
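To illustrate the state-augmentation idea, here is a hypothetical toy example (not part of this exercise): a random walk with an unknown constant drift b, \(X_{t+1} = X_t + b\). Because b enters linearly, the augmented state (X, b) still evolves as a linear model, so the classic KF could estimate b alongside X:

```r
## Sketch: augmenting the state with a parameter (hypothetical example,
## not the flu model). Random walk with unknown drift b: X[t+1] = X[t] + b.
## The augmented state z = c(X, b) evolves linearly, z[t+1] = M_aug %*% z[t].
M_aug = matrix(c(1, 1,    ## X[t+1] = X[t] + b
                 0, 1),   ## b is constant through time
               2, 2, byrow = TRUE)
z = c(10, 0.5)            ## current state: X = 10, drift b = 0.5
z_next = M_aug %*% z      ## one forecast step: X advances by b, b unchanged
z_next
```

By contrast, the variances in Q and R never enter the state update linearly, which is why they cannot be estimated this way in the classic KF.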

In our previous model we assumed a Random Walk, which we fit to just Massachusetts. For this version we’ll keep working with a Random Walk, but we’ll add a spatial contagion process to the random-walk process model. In other words, we’re positing that part of the reason we see such strong correlations across states is that infected individuals are able to move across state boundaries and infect individuals in adjacent states. To run such a model we’ll need to define a matrix describing the adjacency between states, where 1 = adjacent, 0 = not adjacent, and the states are in the order: Massachusetts, Connecticut, Rhode.Island, New.Hampshire, Vermont, Maine.

## define adjacency between states selected
adj = matrix(c(0,1,1,1,1,0,    ### state-to-state spatial adjacency (self=0)
               1,0,1,0,0,0,
               1,1,0,0,0,0,
               1,0,0,0,1,1,
               1,0,0,1,0,0,
               0,0,0,1,0,0),nstates,nstates,byrow=TRUE)

To be more specific, let’s assume a simple flux process based just on adjacency, and ignore differences in how population size, border length, transportation corridors, etc. affect the movement of individuals among the New England states.

\(X_{i,t+1} = X_{i,t} + \alpha\sum_{j}adj_{i,j}(X_{j,t}-X_{i,t})+\epsilon_{i,t}\)

Thus, if state j has more cases than state i, this will tend to increase infection in state i. For your reference, below is the JAGS model fit to the log-transformed flu data.

SpatialRandomWalk = "
model{

  #### Data Model
  for(t in 1:n){
    for(i in 1:nstate){
      y[i,t] ~ dnorm(x[i,t],tau_obs)
    }
  }

  #### Process Model
  for(t in 2:n){
    for(i in 1:nstate){
      mu[i,t] <- x[i,t-1] + alpha * sum(adj[i,1:nstate]*(x[1:nstate,t-1]-x[i,t-1]))
    }
    x[1:nstate,t] ~ dmnorm(mu[1:nstate,t],Omega_proc)
  }
  
  #### Priors
  for(i in 1:nstate){
    x[i,1] ~ dnorm(x_ic,tau_ic)
  }
  tau_obs ~ dgamma(a_obs,r_obs)
  Omega_proc ~ dwish(R,k)
  alpha ~ dbeta(1,20)
}
"

And the parameters estimated from the model

## load parameters (assume known)
load("data/KFalpha.params.Rdata")
## observation error
tau_obs
##     tau_obs 
## 0.005094734
## process error covariance
knitr::kable(tau_proc,col.names = states)
|               | Massachusetts | Connecticut | Rhode.Island | New.Hampshire |   Vermont |     Maine |
|---------------|--------------:|------------:|-------------:|--------------:|----------:|----------:|
| Massachusetts |     0.0130656 |   0.0083375 |    0.0064081 |     0.0071263 | 0.0046522 | 0.0059065 |
| Connecticut   |     0.0083375 |   0.0136074 |    0.0065342 |     0.0073813 | 0.0047711 | 0.0060086 |
| Rhode.Island  |     0.0064081 |   0.0065342 |    0.0124185 |     0.0081431 | 0.0054649 | 0.0068645 |
| New.Hampshire |     0.0071263 |   0.0073813 |    0.0081431 |     0.0154499 | 0.0060621 | 0.0076979 |
| Vermont       |     0.0046522 |   0.0047711 |    0.0054649 |     0.0060621 | 0.0107488 | 0.0052666 |
| Maine         |     0.0059065 |   0.0060086 |    0.0068645 |     0.0076979 | 0.0052666 | 0.0118565 |
## process error correlation
knitr::kable(cov2cor(tau_proc),col.names = states)
|               | Massachusetts | Connecticut | Rhode.Island | New.Hampshire |   Vermont |     Maine |
|---------------|--------------:|------------:|-------------:|--------------:|----------:|----------:|
| Massachusetts |     1.0000000 |   0.6252902 |    0.5030751 |     0.5015754 | 0.3925707 | 0.4745606 |
| Connecticut   |     0.6252902 |   1.0000000 |    0.5026571 |     0.5090742 | 0.3945061 | 0.4730532 |
| Rhode.Island  |     0.5030751 |   0.5026571 |    1.0000000 |     0.5878879 | 0.4730061 | 0.5657167 |
| New.Hampshire |     0.5015754 |   0.5090742 |    0.5878879 |     1.0000000 | 0.4704107 | 0.5687673 |
| Vermont       |     0.3925707 |   0.3945061 |    0.4730061 |     0.4704107 | 1.0000000 | 0.4665257 |
| Maine         |     0.4745606 |   0.4730532 |    0.5657167 |     0.5687673 | 0.4665257 | 1.0000000 |
## process error SD
sqrt(diag(tau_proc))
## [1] 0.1143047 0.1166508 0.1114383 0.1242976 0.1036765 0.1088874

Now that we have estimates for our parameters, let’s write functions that evaluate the classic Kalman Filter. Note that if you were running the KF in ‘operational’ mode, where new data arrive in real time, you wouldn’t run the function all at once, but rather would call KalmanAnalysis every time new data are observed, followed by KalmanForecast to make a new forecast.

##'  Kalman Filter
##' @param  M   = model matrix
##' @param  mu0 = initial condition mean vector
##' @param  P0  = initial condition covariance matrix
##' @param  Q   = process error covariance matrix
##' @param  R   = observation error covariance matrix
##' @param  Y   = observation matrix (with missing values as NAs), time as col's
##'
##' @return list
##'  mu.f, mu.a  = state mean vector for (a)nalysis and (f)orecast steps
##'  P.f, P.a    = state covariance matrix for a and f
KalmanFilter <- function(M,mu0,P0,Q,R,Y){
  
  ## storage
  nstates = nrow(Y)  
  nt = ncol(Y)
  mu.f  = matrix(NA,nstates,nt+1)  ## forecast mean for time t
  mu.a  = matrix(NA,nstates,nt)  ## analysis mean for time t
  P.f  = array(NA,c(nstates,nstates,nt+1))  ## forecast variance for time t
  P.a  = array(NA,c(nstates,nstates,nt))  ## analysis variance for time t

  ## initialization
  mu.f[,1] = mu0
  P.f[,,1] = P0
  I = diag(1,nstates)

  ## run updates sequentially for each observation.
  for(t in 1:nt){

    ## Analysis step: combine previous forecast with observed data
    KA <- KalmanAnalysis(mu.f[,t],P.f[,,t],Y[,t],R,I)
    mu.a[,t] <- KA$mu.a
    P.a[,,t] <- KA$P.a
    
    ## Forecast step: predict to next step from current
    KF <- KalmanForecast(mu.a[,t],P.a[,,t],M,Q)
    mu.f[,t+1] <- KF$mu.f
    P.f[,,t+1] <- KF$P.f
  }
  
  return(list(mu.f=mu.f,mu.a=mu.a,P.f=P.f,P.a=P.a))
}

##' Kalman Filter: Analysis step
##' @param  mu.f = Forecast mean (vector)
##' @param  P.f  = Forecast covariance (matrix)
##' @param  Y    = observations, with missing values as NAs (vector)
##' @param  R    = observation error covariance (matrix)
##' @param  H    = observation matrix (maps observations to states)
KalmanAnalysis <- function(mu.f,P.f,Y,R,H){
  obs = !is.na(Y) ## which Y's were observed?
  if(any(obs)){
    H <- H[obs,,drop=FALSE]                          ## subset obs matrix to observed rows
    K <- P.f %*% t(H) %*% solve(H %*% P.f %*% t(H) + R[obs,obs,drop=FALSE]) ## Kalman gain
    mu.a <- mu.f + K %*% (Y[obs] - H %*% mu.f)       ## update mean
    P.a <- (diag(1,nrow(P.f)) - K %*% H) %*% P.f     ## update covariance
  } else {
    ##if there's no data, the posterior is the prior
    mu.a = mu.f
    P.a = P.f
  }
  return(list(mu.a=mu.a,P.a=P.a))
}
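To build intuition for the analysis step, here’s a minimal one-state sketch (all numbers hypothetical): with a single state and H = 1, the Kalman gain reduces to P.f/(P.f + R), and the update is a precision-weighted average of the forecast and the observation.

```r
## One-state analysis step (hypothetical numbers). With H = 1 the gain
## is K = P.f/(P.f + R); the posterior mean sits between the forecast
## mean and the observation, and the posterior variance shrinks below both.
mu.f = 2.0; P.f = 4.0   ## forecast mean and variance
y    = 3.0; R1  = 1.0   ## observation and its error variance
K    = P.f / (P.f + R1)         ## Kalman gain = 0.8
mu.a = mu.f + K*(y - mu.f)      ## pulled strongly toward the (more precise) data
P.a  = (1 - K)*P.f              ## posterior variance
c(mu.a = mu.a, P.a = P.a)
```

Equivalently, mu.a = (mu.f/P.f + y/R1)/(1/P.f + 1/R1), the familiar precision-weighted mean from Bayes’ rule with a normal prior and likelihood.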

##' Kalman Filter: Forecast Step
##' @param mu.a = analysis posterior mean (vector)
##' @param P.a  = analysis posterior covariance (matrix)
##' @param M    = model (matrix)
##' @param  Q   = process error covariance (matrix)
KalmanForecast <- function(mu.a,P.a,M,Q){
  mu.f = M%*%mu.a
  P.f  = M %*% P.a %*% t(M) + Q
  return(list(mu.f=mu.f,P.f=P.f))
}
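As a quick sketch (toy numbers), consider what the forecast step does for a pure random walk, where M is the identity: the mean is carried forward unchanged and the covariance simply inflates by Q.

```r
## Toy forecast step (hypothetical numbers): random walk, M = identity.
M2   = diag(2)
mu.a = c(1, 2)
P.a  = matrix(c(0.5, 0.1,
                0.1, 0.4), 2, 2)
Q2   = diag(0.2, 2)
mu.f = M2 %*% mu.a                 ## mean unchanged
P.f  = M2 %*% P.a %*% t(M2) + Q2   ## covariance grows by Q each step
```

This is why, during long gaps in the data, the forecast standard error of a random walk grows step by step until new observations arrive.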

With the Kalman Filter function defined, we need to define the inputs to the function. Note below that I’m using the variable KF00 to store the outputs, where I’m using 00 to indicate that this run was done with the defaults for both the process model and process error covariance. In the assignment below you will rerun this analysis under a number of alternatives varying the process error and the magnitude of spatial flux in the process model.

## log transform data
Y   = log10(y)

## options for process model 
alpha = 0       ## assume no spatial flux
#alpha = 0.05    ## assume a large spatial flux
M = adj*alpha + diag(1-alpha*apply(adj,1,sum))  ## random walk with flux

## options for process error covariance
Q = tau_proc            ## full covariance matrix
#Q = diag(diag(Q))       ## diagonal covariance matrix

## observation error covariance (assumed independent)  
R = diag(tau_obs,nstates) 

## prior on first step, initialize with long-term mean and covariance
mu0 = apply(Y,1,mean,na.rm=TRUE)
P0 = cov(t(Y),use="pairwise.complete.obs")

## Run Kalman Filter
KF00 = KalmanFilter(M,mu0,P0,Q,R,Y)
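As a sanity check on the construction of M (using a toy 3-state chain, not the flu data), we can verify that M %*% x reproduces the flux equation \(X_i + \alpha\sum_j adj_{i,j}(X_j - X_i)\):

```r
## Sanity check (toy 3-state chain, not the flu data): the matrix form
## M %*% x should equal x[i] + alpha * sum_j adj[i,j]*(x[j]-x[i]).
adj3   = matrix(c(0,1,0,
                  1,0,1,
                  0,1,0), 3, 3, byrow = TRUE)
alpha3 = 0.05
x3     = c(1, 2, 4)
M3     = adj3*alpha3 + diag(1 - alpha3*apply(adj3, 1, sum))
direct = x3 + alpha3*(adj3 %*% x3 - apply(adj3, 1, sum)*x3)
stopifnot(isTRUE(all.equal(as.numeric(M3 %*% x3), as.numeric(direct))))
```

Note that each row of M sums to 1, so the flux only redistributes cases among states rather than creating or destroying them.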

After running the Kalman Filter, we can visualize the outputs. The first set of figures below shows the posterior analysis for each state through time. The second set shows how the forecast and analysis standard deviations change through time, with missing data flagged in green along the bottom of each plot. As you can see, the missing data are not synchronous across states, but the mean of the Analysis is influenced by the across-state covariances.

attach(KF00)
nt = length(time)

### plot ANALYSIS mean & CI time-series
par(mfrow=c(3,1))
for(i in 1:6){
  ci = rbind(mu.a[i,]-1.96*sqrt(P.a[i,i,]),mu.a[i,]+1.96*sqrt(P.a[i,i,]))
  plot(time,mu.a[i,],ylim=range(ci,na.rm=TRUE),type='n',main=states[i])
  ecoforecastR::ciEnvelope(time,ci[1,],ci[2,],col="lightBlue")
  lines(time,mu.a[i,],col=4)
  lines(time,Y[i,])
}

## plot ANALYSIS and FORECAST variance time-series
par(mfrow=c(3,1))
for(i in 1:6){
  plot(time,sqrt(P.a[i,i,]),ylim=c(0,sqrt(max(c(P.a[i,i,],P.f[i,i,])))),main=states[i],xlab="Time",
       ylab="Std Error",type='l')
  lines(time,sqrt(P.f[i,i,1:nt]),col=2)
  points(time[is.na(Y[i,])],rep(0,nt)[is.na(Y[i,])],pch="*",col=3) ## flag the NAs
  legend("topright",legend=c("Analysis","Forecast","NAs"),col=1:3,lty=c(1,1,NA),pch=c(NA,NA,8),cex=1.4)
}

Finally, to get a better idea about the dynamics of how the Kalman Filter works we can zoom in to a subset of time for one state and show the Forecast, Analysis, and observed data altogether.

## subset time
time2 <- time[time>as.Date("2015-01-01")]
tsel <- which(time %in% time2)
n = length(time2)*2

## interleave Forecast and Analysis
mu = p = rep(NA,n)
mu[seq(1,n,by=2)] = mu.f[1,tsel]
mu[seq(2,n,by=2)] = mu.a[1,tsel]
p[seq(1,n,by=2)]  = 1.96*sqrt(P.f[1,1,tsel])
p[seq(2,n,by=2)]  = 1.96*sqrt(P.a[1,1,tsel])
ci = cbind(mu-p,mu+p)
time3 = sort(c(time2,time2+1))

## plot Forecast, Analysis, and data
plot(time3,mu,ylim=range(ci),type='l')
ecoforecastR::ciEnvelope(time3,ci[,1],ci[,2],col="lightBlue")
lines(time3,mu,lwd=2)
points(time,Y[1,])

Assignment

The assignment is to run the KF under all four combinations of covariance in the process model versus process error and compare the results. In particular you’ll want to pay attention to the missing data at the beginning of the time series for some states. You’ll also want to comment on how spatial adjacency affects the confidence in the inferences (some states are more isolated than others) in the four different scenarios. Finally, you’ll want to note that the alpha estimated from the data itself (0.000209) is close to zero, and thus our real forecast would be much more like our no-flux run than our high-flux run.

KF01: Rerun with process error set to just the diagonal matrix of Q, compare the results with the original – what impact does including covariance in the process error have on the inference?

## options for process model 
alpha = 0       ## assume no spatial flux
#alpha = 0.05    ## assume a large spatial flux
M = adj*alpha + diag(1-alpha*apply(adj,1,sum))  ## random walk with flux

## options for process error covariance
#Q = tau_proc            ## full covariance matrix
Q = diag(diag(Q))       ## diagonal covariance matrix

## Run Kalman Filter
KF01 = KalmanFilter(M,mu0,P0,Q,R,Y)

attach(KF01)
nt = length(time)

### plot ANALYSIS mean & CI time-series
par(mfrow=c(3,1))
for(i in 1:6){
  ci = rbind(mu.a[i,]-1.96*sqrt(P.a[i,i,]),mu.a[i,]+1.96*sqrt(P.a[i,i,]))
  plot(time,mu.a[i,],ylim=range(ci,na.rm=TRUE),type='n',main=states[i])
  ecoforecastR::ciEnvelope(time,ci[1,],ci[2,],col="lightBlue")
  lines(time,mu.a[i,],col=4)
  lines(time,Y[i,])
}

## plot ANALYSIS and FORECAST variance time-series
par(mfrow=c(3,1))
for(i in 1:6){
  plot(time,sqrt(P.a[i,i,]),ylim=c(0,sqrt(max(c(P.a[i,i,],P.f[i,i,])))),main=states[i],xlab="Time",
       ylab="Std Error",type='l')
  lines(time,sqrt(P.f[i,i,1:nt]),col=2)
  points(time[is.na(Y[i,])],rep(0,nt)[is.na(Y[i,])],pch="*",col=3) ## flag the NAs
  legend("topright",legend=c("Analysis","Forecast","NAs"),col=1:3,lty=c(1,1,NA),pch=c(NA,NA,8),cex=1.4)
}

With alpha = 0, the process model itself shares no information between states, so the only channel connecting them is the covariance in the process error (Box 13.1 from textbook). Setting Q to just its diagonal closes that channel: in KF00 the full Q propagates covariance into the forecast covariance P.f, so the Kalman gain lets observed states constrain states with missing data, while in KF01 each state’s filter runs essentially independently and its uncertainty grows faster during gaps in its own data.
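A two-state sketch (hypothetical numbers) makes the role of the off-diagonals concrete: when only state 1 is observed, the Kalman gain row for state 2 is nonzero only if the forecast covariance couples the two states.

```r
## Two-state analysis with only state 1 observed (hypothetical numbers).
## With covariance in P.f the observed state updates the unobserved one;
## with a diagonal P.f it cannot.
H1 = matrix(c(1, 0), 1, 2)                    ## observe state 1 only
R1 = 0.5                                      ## observation error variance
Pcov  = matrix(c(1, 0.6, 0.6, 1), 2, 2)       ## forecast covariance, coupled
Pdiag = diag(2)                               ## forecast covariance, independent
K_cov  = Pcov  %*% t(H1) %*% solve(H1 %*% Pcov  %*% t(H1) + R1)
K_diag = Pdiag %*% t(H1) %*% solve(H1 %*% Pdiag %*% t(H1) + R1)
K_cov[2,1]   ## nonzero: state 2 borrows strength from state 1
K_diag[2,1]  ## zero: no information shared
```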

KF11: Rerun with alpha = 0.05 and the diagonal Q matrix. Comparing KF11 to KF01, what impact does including a spatial flux in the process model have on the inference?

## options for process model 
#alpha = 0       ## assume no spatial flux
alpha = 0.05    ## assume a large spatial flux
M = adj*alpha + diag(1-alpha*apply(adj,1,sum))  ## random walk with flux

## options for process error covariance
#Q = tau_proc            ## full covariance matrix
Q = diag(diag(Q))       ## diagonal covariance matrix

## Run Kalman Filter
KF11 = KalmanFilter(M,mu0,P0,Q,R,Y)

attach(KF11)
nt = length(time)

### plot ANALYSIS mean & CI time-series
par(mfrow=c(3,1))
for(i in 1:6){
  ci = rbind(mu.a[i,]-1.96*sqrt(P.a[i,i,]),mu.a[i,]+1.96*sqrt(P.a[i,i,]))
  plot(time,mu.a[i,],ylim=range(ci,na.rm=TRUE),type='n',main=states[i])
  ecoforecastR::ciEnvelope(time,ci[1,],ci[2,],col="lightBlue")
  lines(time,mu.a[i,],col=4)
  lines(time,Y[i,])
}

## plot ANALYSIS and FORECAST variance time-series
par(mfrow=c(3,1))
for(i in 1:6){
  plot(time,sqrt(P.a[i,i,]),ylim=c(0,sqrt(max(c(P.a[i,i,],P.f[i,i,])))),main=states[i],xlab="Time",
       ylab="Std Error",type='l')
  lines(time,sqrt(P.f[i,i,1:nt]),col=2)
  points(time[is.na(Y[i,])],rep(0,nt)[is.na(Y[i,])],pch="*",col=3) ## flag the NAs
  legend("topright",legend=c("Analysis","Forecast","NAs"),col=1:3,lty=c(1,1,NA),pch=c(NA,NA,8),cex=1.4)
}

Including spatial flux makes a big difference in our models. Because information is shared between adjacent states, the error for a state with missing data no longer balloons unchecked; rather, it decreases whenever a neighboring state has data that can constrain the estimate. This is most obvious for Vermont, whose error drops as soon as data are collected for New Hampshire, an adjacent state. The forecast then fluctuates with the seasons as the other states’ forecasts do, whereas previously it was flat during any period of missing data.

KF10: Rerun with alpha = 0.05 and the original Q matrix. Compare KF10 to previous runs – what impact does including both a spatial process and a process error covariance have over their impacts individually.

## options for process model 
#alpha = 0       ## assume no spatial flux
alpha = 0.05    ## assume a large spatial flux
M = adj*alpha + diag(1-alpha*apply(adj,1,sum))  ## random walk with flux

## options for process error covariance
Q = tau_proc            ## full covariance matrix
#Q = diag(diag(Q))       ## diagonal covariance matrix

## Run Kalman Filter
KF10 = KalmanFilter(M,mu0,P0,Q,R,Y)

attach(KF10)
nt = length(time)

### plot ANALYSIS mean & CI time-series
par(mfrow=c(3,1))
for(i in 1:6){
  ci = rbind(mu.a[i,]-1.96*sqrt(P.a[i,i,]),mu.a[i,]+1.96*sqrt(P.a[i,i,]))
  plot(time,mu.a[i,],ylim=range(ci,na.rm=TRUE),type='n',main=states[i])
  ecoforecastR::ciEnvelope(time,ci[1,],ci[2,],col="lightBlue")
  lines(time,mu.a[i,],col=4)
  lines(time,Y[i,])
}

## plot ANALYSIS and FORECAST variance time-series
par(mfrow=c(3,1))
for(i in 1:6){
  plot(time,sqrt(P.a[i,i,]),ylim=c(0,sqrt(max(c(P.a[i,i,],P.f[i,i,])))),main=states[i],xlab="Time",
       ylab="Std Error",type='l')
  lines(time,sqrt(P.f[i,i,1:nt]),col=2)
  points(time[is.na(Y[i,])],rep(0,nt)[is.na(Y[i,])],pch="*",col=3) ## flag the NAs
  legend("topright",legend=c("Analysis","Forecast","NAs"),col=1:3,lty=c(1,1,NA),pch=c(NA,NA,8),cex=1.4)
}

Including both spatial flux and process error covariance improves the inference over either alone. The process error covariances were estimated from the data, so propagating them into the forecast tightens the error around both missing and observed data for each state. We see this when we add the full covariance matrix to the model that previously had only the spatial flux: in Maine, for example, the mean forecast and the confidence intervals better track the peaks and valleys of later years once the process error covariance allows relationships among all states (rather than only adjacent ones). The effect would be more pronounced if the covariances were larger; here they are modest, so overall the standard error estimates do not change dramatically.

If the process model included a strong environmental covariate that correlated with the seasonal oscillation of influenza, how would this impact the Forecast and Analysis? Comment on how this would affect mu.f, mu.a, P.f, P.a, Q, and R specifically.

If there were a strong environmental covariate that correlated with influenza, such as temperature, this could improve both the forecast and the analysis. The high predictability of temperature, along with possibly higher data resolution, can make mu.f (the forecast mean) more accurate and more precise (therefore decreasing P.f). This increased forecast accuracy/precision should mean that mu.a will be more similar to mu.f than to the data. The precision of the analysis will also increase.

The effect of a strong environmental covariate on Q would depend on the nature of the covariate. Since Q is the process error covariance, temperature could reduce process error of each state without necessarily reducing the error covariance between states; if, however, temperature explains the covariance between states, then the values in Q would decrease.

Observation error (R) should not be affected by a new covariate, unless it’s something that would affect our likelihood of observing infected individuals - i.e. maybe heat drives people to the hospital, so we are more likely to observe the individuals that are sick, whereas in colder temperatures they would stay home and not be observed.

Explain conceptually what would have to change in the Kalman Filter function if we replaced the current linear model with a nonlinear process model, in order to turn it into an Extended Kalman Filter function.

When converting the Kalman Filter to an Extended Kalman Filter, we need to change the forecast step. The nonlinear model f(x) itself is used to propagate the forecast mean (mu.f = f(mu.a) instead of M %*% mu.a), but the covariance update has no exact closed form, so we linearize: the matrix M in P.f = M P.a M^T + Q is replaced by the Jacobian of f (the matrix of partial derivatives of each state’s update with respect to each state), evaluated at the current estimate and recomputed at every timestep. This adds a computational step, and because the linearization is only a local approximation, the EKF can misestimate uncertainty when the model is strongly nonlinear. Still, it is a useful tool because nonlinear models are often much more realistic approximations of natural phenomena.
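To make that concrete, here’s a hedged sketch of one EKF forecast step using logistic growth as a stand-in nonlinear model (all parameters hypothetical): f propagates the mean, and its derivative (the 1-D Jacobian) replaces M in the variance update.

```r
## EKF forecast step sketch (hypothetical 1-state logistic model).
## The nonlinear model propagates the mean; its derivative, evaluated
## at the current estimate, replaces M when propagating the variance.
r = 0.3; Kcap = 10                        ## logistic growth parameters
f  = function(x) x + r*x*(1 - x/Kcap)     ## nonlinear process model
fJ = function(x) 1 + r - 2*r*x/Kcap       ## analytic Jacobian df/dx
mu.a = 2; P.a = 0.5; Q1 = 0.1             ## analysis mean/variance, process error
mu.f = f(mu.a)                            ## nonlinear mean update
P.f  = fJ(mu.a)*P.a*fJ(mu.a) + Q1         ## linearized variance update
```

In the multivariate case fJ would return a matrix of partial derivatives, and the update would be P.f = J %*% P.a %*% t(J) + Q, with J recomputed at each timestep.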